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Abstract. No proven method is currently available for the 
reliable short time prediction of earthquakes (minutes to 
months). However, it is possible to make probabilistic haz- 
ard assessments for earthquake risk. These are primarily 
based on the association of small earthquakes with future 
large earthquakes. In this paper we discuss a new approach 
to earthquake forecasting. This approach is based on a pat- 
tern informatics (PI) method which quantifies temporal vari- 
ations in seismicity. The output is a map of areas in a seis- 
mogenic region ("hotspots") where earthquakes are forecast 
to occur in a future 10-year time span. This approach has 
been successfully applied to California, to Japan, and on a 
worldwide basis. These forecasts are binary-an earthquake 
is forecast either to occur or to not occur The standard ap- 
proach to the evaluation of a binary forecast is the use of the 
relative operating characteristic (ROC) diagram, which is a 
more restrictive test and less subject to bias than maximum 
likelihood tests. To test our PI method, we made two types 
of retrospective forecasts for California. The first is the PI 
method and the second is a relative intensity (RI) forecast 
based on the hypothesis that future earthquakes will occur 
where earthquakes have occurred in the recent past. While 
both retrospective forecasts are for the ten year period 1 Jan- 
uary 2000 to 3 1 December 2009, we performed an interim 
analysis 5 years into the forecast. The PI method out per- 
forms the RI method under most circumstances. 



1 Introduction 

Earthquakes are the most feared of natural hazards because 
they occur without warning. Hurricanes can be tracked, 
floods develop gradually, and volcanic eruptions are pre- 
ceded by a variety of precursory phenomena. Earthquakes, 
however, generally occur without any warning. There have 
been a wide variety of approaches applied to the forecast- 
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ing o f earthquakes ( Mogil ll985t TurcottS Il99lt iLomnitzl 
19941 iKeilis-Borokl l2002HScholzl 120021: iKanamorii l2003h 

These approaches can be divided into two general classes; 
the first is based on empirical observations of precursory 
changes. Examples include precursory seismic activity, pre- 
cursory ground motions, and many others. The second ap- 
proach is based on statistical patterns of seismicity. Neither 
approach has been able to provide reliable short-term fore- 
casts (days to months) on a consistent basis. 

Although short-term predictions are not available, long- 
term seismic-hazard assessments can be made. A large frac- 
tion of all earthquakes occur in the vicinity of plate bound- 
aries, although some do occur in plate interiors. It is also 
possible to assess the long-term probability of having an 
earthquake of a specified magnitude in a specified region. 
These assessments are primarily based on the hypothesis that 
future earthquakes will occur in region s where past earth- 
quakes have occurred jprankell Il995t IKossoboko v et all 
^000). Specifically, the rate of occurrence of small earth- 
quakes in a region can be analyzed to assess the probability 
of occurrence of much larger earthquakes. 

The principal focus of this paper i s a new approach to 
earthquake forecasting jRundle et all l2002t iTiampo et all 
l2002b'a':'Run dle et all 12003V Our method does not predict 
earthquakes, but it does forecast the regions (hotspots) where 
earthquakes are most likely to occur in the relatively near fu- 
ture (typically ten years). The objective is to reduce the ar- 
eas of earthquake risk relative to those given by long-term 
hazard assessments. Our approach is based on pattern infor- 
matics (PI), a technique that quantifies temporal variations in 
seismicity patterns. The result is a map of areas in a seismo- 
genic region (hotspots) where earthquakes are likely to occur 
during a specified period in the future. A forecast for Califor- 
nia was published by our group in 2002(Rundle etal.,200l. 
Subsequently, fifteen of the seventeen California earthquakes 
with magnitudes M>5 occurred in or immediately adjacent 
to the resulting hotspots. A forecast for Japan, presented in 
Tokyo in early October 2004, successfully forecast the loca- 
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October 2004. A global forecast, presented at the early De- 
cember 2004 meeting of the American Geophysical Union, 
successfully forecast the locations of the 23 December 2004, 
M=8.1 Macquarie Island earthquake, and the 26 December 
2004 M=9.0 Sumatra earthquake. Before presenting further 
details of these studies we will give a brief overview of the 
current state of earthquake prediction and forecasting. 

2 Empirical approaches 

Empirical approaches to earthquake prediction rely on lo- 
cal observations of precursory phenomena in the vicinity of 
the earthquake to be predicted. It has been suggested that 
one or more of the fol lowing phenomena may indicate a fu- 
ture earthqu ake ( Mogi tI 1 985pTurcotteLI 1 99 ltlLomnitzl[l994l 
iKgiHs-Borok, 2002; Scholz, '2002; 'Kanamori', '2003"): 1 ) pre- 
cursory increase or decrease in seismicity in the vicinity of 
the origin of a future earthquake rupture, 2) precursory fault 
slip that leads to surface tilt and/or displacements, 3) electro- 
magnetic signals, 4) chemical emissions, and 5) changes in 
animal behavior 

Examples of successful near-term predictions of future 
earthquakes have been rare. A notable exception was the 
prediction of the M=7.3 Haicheng earthquake in northeast 
China that occurred on 4 February 1975. This prediction led 
to the evacuation of the city which undoubtedly saved many 
lives. The Chinese reported that the successful prediction 
was based on foreshocks, groundwater anomalies, and an- 
imal behavior. Unfortunately, a similar prediction was not 
made prior to the magnitude M=7.8 Tang shan earthquake 
that occurred on 28 July 1976 (Utsu, '2003V Official reports 
placed the death toll in this earthquake at 242,000, although 
unofficial reports placed it as high as 655,000. 

In order to thoroughly test for the occurrence of direct 
precursors the United States Geological Survey (USGS) ini- 
tiated the Parkfie ld (California) Earthquake Prediction Ex - 
perimentin 1985 (iBakun and LindhIll985 >.'Kanamorii l2o"o3l) . 
Earthquakes on this section of the San Andreas had occurred 
in 1857, 1881, 1901, 1922, 1934, and 1966. It was expected 
that the next earthquake in this sequence would occur by the 
early 1990's, and an extensive range of instrumentation was 
installed. The next earthquake in the sequence finally oc- 
curred on 28 September 2004. No precursory phenomena 
were observed that were significantly above the background 
noise level. Although the use of empirical precursors cannot 
be ruled out, the future of those approaches does not appear 
to be promising at this time. 



3 Statistical and statistical physics approaches 

A variety of studies have utilized variations in seismicity over 
relatively large distances to forecast future earthquakes. The 
distances are large relative to the rupture dimension of the 
subsequent earthquake. These approaches are based on the 
concept that the earth ' s crus t is an activated thermodynamic 
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behavior is the continuous level of background seismicity 
in all seismographic areas. About a million magnitude two 
earthquakes occur each year on our planet. In southern Cal- 
ifornia about a thousand magnitude two earthquakes occur 
each year. Except for the aftershocks of large earthquakes, 
such as the 1992 M=7.3 Landers earthquake, this seismic 
activity is essentially constant over time. If the level of back- 
ground seismicity varied systematically with the occurrence 
of large earthquakes, earthquake forecasting would be rela- 
tively easy. This, however, is not the case. 

There is increasing evidence that there are systematic pre- 
cursory variations in some aspects of regional seismicity. 
For example, it has been observed that there is a system- 
atic variation in the number of magnitude M=3 and larger 
earthquakes prior to at least some magnitude M=5 and 
larger earthquakes, and a systematic variation in the num- 
ber of magnitude M=5 and larger earthquakes prior to some 
magnitude M=7 and larger earthquakes. The spatial re- 
gions associated with this phenomena tend to be relatively 
large, suggesting that an earthquake may resemble a phase 
change with an increase in the "correlation length" prior to an 
earthquake (Bowman e t al,,, 1998; Jau me and Sykes. 199i. 
There have also been reports of anomalous quiescence in the 
source region prior to a large earthquake, a pattern that is 
often called a "Mogi Donut" (Mogi, 1985 ; Kanamori, 20011 
IWvss and HabermannIll988HWvssill997lK 

Many authors have noted the occurrence of a relatively 
large number of intermediate-sized earthquakes prior to a 
great earthquake. A specific example was the sequence 
of earthqua kes that p receded the 1906 San Francisco 
earthquake iik es and Ja ume. 1990). This seismic ac- 
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these studies has depended on knowing the location of the 
subsequent earthquake. 

A series of statistical algorithms to make intermediate term 
earthquake predictions have been developed by a Russian 
group under the directi on of V. I. Keilis-Borok using pattern 
recognition techniques jKeihs-Bo rok. 1990, 19§3). Seismic- 
ity in circular regions with diameters of 500 km was an- 
alyzed. Based primarily on seismic activation, earthquake 
alarms were issued for one or more regions, with the alarms 
generally lasting for three years. Alarms have been issued 
regularly since the mid 1980's and scored two notable suc- 
cesses: the prediction of the 1988 Armenian earthquake and 
the 1989 Loma Prieta earthquake. While a reasonably high 
success rate has been achieved, there have been some no- 
table misses including the recentM=9.0 Sumatra and M=8. 1 
Macquerie Island earthquakes. 

More recently, this group has used chains of premonitory 
earthquakes as the basis for iss uing alarms dShebalin et all 
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predicted the M=6.5, 22 December 2003 San Simeon (Cal- 
ifornia) earthquake and the M=8.1, 25 September 2003 
Tokachi-oki, (Japan) earthquake with lead times of six and 
seven months respectively. However, an alarm issued for 
southern California, valid during the spring and summer of 
2004, was a false alarm. 



4 Chaos and forecasting 

Earthquakes are caused by displacements on preexisting 
faults. Most earthquakes occur at or near the boundaries be- 
tween the near-rigid plates of plate tectonics. Earthquakes in 
California are associated with the relative motion between 
the Pacific plate and the North American plate. Much of 
this motion is taken up by displacements on the San An- 
dreas fault, but deformation and earthquakes extend from the 
Rocky Mountains on the east into the Pacific Ocean adjacent 
to California on the west. Clearly this deformation and the 
associated earthquakes are extremely complex. 

It is now generally accept ed that ea r thquak es are exam- 
ples of deterministic chaos llTiircottel Il997l) . Some au- 
thors ('Cielle r et all ll997*.'Geller'.'l99'A have argued that this 
chaotic behavior precludes the prediction of earthquakes. 
However, weather is also chaotic, but forecasts can be made. 
Weather forecasts are probabilistic in the sense that weather 
cannot be predicted exactly. One such example is the track 
of a hurricane. Probabilistic forecasts of hurricane tracks 
are routinely made; sometimes they are extremely accu- 
rate while at other times they are not. Another example of 
weather forecasting is the forecast of El Nino events. Fore- 
casting techniques based on pattern recognition and princi- 
ple components of the sea surface temperature fluctuation 
time series have been developed that are quite successful in 
forecasting future El Ninos, but again they are probabilis- 
tic in nature CChen et al. , 2004). It has also been argued 
dSvkes et al.L 1 19991) that chaotic behavior does not preclude 
the probabilistic forecasting of future earthquakes. Over the 
past five years our group has developed ( Rundle et al. , 2002 
Tiamno et all l2002blal: liundle et all Eooj ^ lHonidav et al.1 
2005*) a technique for forecasting the locations where earth- 
quakes will occur based on pattern informatics (PI). This type 
of approach has close links to principle component analysis, 
which has been successfully used for the forecasting of El 
Ninos. 



5 The PI method 

Seismic networks provide the times and locations of earth- 
quakes over a wide range of scales. One of the most sensitive 
networks has been deployed over southern California and the 
resulting catalog is readily available. Our objective has been 
to analyze the historical seismicity for anomalous behavior 
that would provide information on the occurrence of future 
earthquakes. At this point we are not able to forecast the 
times of future earthquakes with precision. However, our ap- 



are most likely to occur during a future time window. At the 
present time, this time window is typically taken to be ten 
years, although it appears that it is possible to utilize shorter 
time windows. 

Our approach divides the seismogenic region to be stud- 
ied into a grid of square boxes whose size is related to the 
magnitude of the earthquakes to be forecast. The rates of 
seismicity in each box are studied to quantify anomalous be- 
havior The basic idea is that any seismicity precursors rep- 
resent changes, either a local increase or decrease of seismic 
activity, so our method identifies the locations in which these 
changes are most significant during a predefined change in- 
terval. The subsequent forecast interval is the decadal time 
window during which the forecast is valid. The box size is se- 
lected to be consistent with the correlation length associated 
with accelerated seismic activity ( Bowman et al,. 1998), and 
the minimum earthquake magnitude considered is the lower 
limit of sensitivity and completeness of the network in the 
region under consideration. 

The detailed utilization of the PI method that we have used 
for earthquake forecasting is as follows: 

1 . The region of interest is divided into Nb square boxes 
with linear dimension Aa;. Boxes are identified by a 
subscript i and are centered at a;,;. For each box, there is 
a time series Ni{t), which is the number of earthquakes 
per unit time at time t larger than the lower cut-off mag- 
nitude Ale- The time series in box i is defined between 
a base time ti, and the present time t. 

2. All earthquakes in the region of interest with magni- 
tudes greater than a lower cutoff magnitude Mc are in- 
cluded. The lower cutoff magnitude is specified in 
order to ensure completeness of the data through time, 
from an initial time to to a final time t2- 

3. Three time intervals are considered: 

(a) A reference time interval from tf, to ti. 

(b) A second time interval from ti, to t2, t2 > ti. The 
change interval over which seismic activity changes 
are determined is then t2 — ti. The time tt, is cho- 
sen to lie between to and ti. Typically we take 
to = 1932, ti = 1990, and = 2000. The ob- 
jective is to quantify anomalous seismic activity in 
the change interval ti to t2 relative to the reference 
interval ti, to ti. 

(c) The forecast time interval t2 to ^3, for which the 
forecast is valid. We take the change and forecast 
intervals to have the same length. For the above 
example, = 2010. 

4. The seismic intensity in box i, Ii{ti,,t), between two 
times tb < t, can then be defined as the average num- 
ber of earthquakes with magnitudes greater than Mc that 
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interval <{, to t. Therefore, using discrete notation, we 
can write: 

1 * 

HU,t) = -— J2 N^{t'), (1) 

^ t'=t, 

where the sum is performed over increments of the time 
series, say days. 

5. In order to compare the intensities from different time 
intervals, we require that they have the same statistical 
properties. We therefore normalize the seismic intensi- 
ties by subtracting the mean seismic activity of all boxes 
and dividing by the standard deviation of the seismic ac- 
tivity in all boxes. The statistically normalized seismic 
intensity of box i during the time interval tt, to t is then 
defined by 

f(, ,^ U{h,t)- < u{h,t) > 

h{tb,t) = 7—77 , (2) 

where < /i(<f,, t) > is the mean intensity averaged over 
all the boxes and a{ti,,t) is the standard deviation of 
intensity over all the boxes. 

6. Our measure of anomalous seismicity in box i is the dif- 
ference between the two normalized seismic intensities: 

^ h{tbM) ~ h{tbM)- (3) 

7. To reduce the relative importance of random fluctua- 
tions (noise) in seismic activity, we compute the aver- 
age change in intensity, A/i(io, ti, t2) over all possi- 
ble pairs of normalized intensity maps having the same 
change interval: 



AI,{to,ti,t2) = > AI,{h,ti,t2), (4) 

ti - to /-^ 

tb — to 

where the sum is performed over increments of the time 
series, which here are days. 

8. We define the probability of a future earthquake in box 
i, Pi{to, ti,t2, ), as the square of the average intensity 
change: 



P^{to,t^,t2,) ^ Ah[hMM) ■ (5) 

9. To identify anomalous regions, we wish to compute the 
change in the probability Pi{to,ti,t2,) relative to the 
background so that we subtract the mean probability 
over all boxes. We denote this change in the probability 
by 

AP,{toMM) ^ P^{hMM)- < P^{toMM) >,(6) 



Hotspots are defined to be the regions where 
APi(to,ii, ^2) is positive. In these regions, Pi{to, ti, 12) is 
larger than the average value for all boxes (the background 
level). Note that since the intensities are squared in defining 
probabilities the hotspots may be due to either increases of 
seismic activity during the change time interval (activation) 
or due to decreases (quiescence). We hypothesize that 
earthquakes with magnitudes larger than Mc + 2 will occur 
preferentially in hotspots during the forecast time interval t2 

to tg. 

6 Applications of the PI method 

The PI method was first applied to seismicity in southern Cal- 
ifornia and adjacent regions (32° to 37° N lat, 238° to 245° E 
long). This region was divided into a grid of 3500 boxes with 
Ax = 0.1° (11 km). Consistent with the sensitivity of the 
southern California seismic network, the lower magnitude 
cutoff was taken to be M=3. The initial time was <o=1932, 
the change interval was from ti=1990 to t2=2000, and the 
forecast interval was from i2=2000 to t.s=2010 . The initial 
studies for California were published in 2002 (Rundl e et all 
I2OO2) . the results are reproduced in Figure [J The colored 
regions are the hotspots defined to be the boxes where AP is 
positive. This forecast of where earthquakes would likely 
occur was considered to be valid for the forecast interval 
from 2000 to 2010 and would be applicable for earthquakes 
with M=5 and larger Since 1 January 2000 seventeen earth- 
quakes with M>5 have occurred in the test region. These are 
also shown in Figure^ and information on these earthquakes 
is given in Tabled We consider the forecast to be successful 
if the epicenter of the earthquake li es within a hot spot box or 
in one of the eight adjoining boxes llMoorelll962l) . Fifteen of 
the seventeen earthquakes were successfully forecast. 

The second area to which the PI method was applied was 
Japan. The forecast hotspots for the Tokyo region (33° to 
38° N lat, 136° to 142° W long) are given in FigureEl The 
initial time was io=1965, and the change and forecast inter- 
vals were the same as those used for California. Between 1 
January 2000 and 14 October 2004, 99 earthquakes occurred 
and 91 earthquakes were successfully forecast. This forecast 
was presented at the International Conference on Geodynam- 
ics, 14-16 October 2004, Tokyo by one of the authors (JBR). 
Subsequently the Niigata earthquake (M=6.8) occurred on 
23 October 2004. This earthquake and its subsequent M>5 
aftershocks were successfully forecast. 

The PI method has also been applied on a worldwide ba- 
sis. In this case 1° x 1° boxes were considered. Ax = 1° 
(110 km). Consistent with the sensitivity of the global seis- 
mic network the lower magnitude cutoff was taken to be 
Mc = 5. The initial time was to=1965; the change and fore- 
cast intervals were the same as above. The resulting map 
of hotspots was presented by two of the authors (DLT and 
JRH) at the Fall Meeting of the American Geophysical Union 
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Fig. 1. Application of the PI method to southern California. Col- 
ored areas are the forecast hotspots for the occurrence of M>5 
earthquakes during the period 2000-2010 derived using the PI 
method. The color scale gives values of the log^f^{P/ Pmax). Also 
shown are the locations of the seventeen earthquakes with M>5 that 
have occurred in the region since 1 January 2000. Fifteen of the sev- 
enteen earthquakes were successfully forecast. More details of the 
earthquakes are given in TableQ 



Table 1. Earthquakes with M>5 that occurred in the California test 
region since 1 January 2000. Fifteen of these seventeen earthquakes 
were successfully forecast. The two missed events are marked with 
an asterisk. 





Event 


Magnitude 


Date 


1 


Big Bear I 


M=5.1 


10 Feb. 2001 


2 


Coso 


M=5.1 


17 July 2001 


3 


Anza I 


M=5.1 


32 Oct. 2001 


4 


Baja 


M=5.7 


22 Feb. 2002 


5 


Gilroy 


M=5.0 


13 May. 2002 


6 


Big Bear- II 


M=5.4 


22 Feb. 2003 


7 


San Simeon* 


M=6.5 


22 Dec. 2003 


8 


San Clemente Island* 


M=5.2 


15 June 2004 


9 


Bodie I 


M=5.5 


18 Sept. 2004 


10 


Bodie II 


M=5.4 


18 Sept. 2004 


11 


Parkfield I 


M=6.0 


18 Sept. 2004 


12 


Parkfield II 


M=5.2 


18 Sept. 2004 


13 


Arvin 


M=5.0 


29 Sept. 2004 


14 


Parkfield III 


M=5.0 


30 Sept. 2004 


15 


Wheeler Ridge 


M=5.2 


16 April 2005 


16 


Anza II 


M=5.2 


12 June 2005 


17 


Yucaipa 


M=5.0 


16 June 2005 




142' 



-4 



-3 



-2 

log(AP/AP^,J 



Fig. 2. Application of the PI method to central Japan. Colored ar- 
eas are the forecast hotspots for the occurrence of M>5 earthquakes 
during the period 2000-2010 derived using the PI method. The color 
scale gives values of the logj^Q(P/Pmaa;). Also shown are the lo- 
cations of the 99 earthquakes with M>5 that have occurred in the 
region since 1 January 2000. 



forecast of where earthquakes would occur was considered 
to be valid for the period 2000 to 2010 and would be ap- 
plicable for earthquakes with magnitudes greater than 7.0. 
Between 1 January 2000 and 14 December 2004 there were 
63 M>7 earthquakes worldwide; 55 of these earthquakes oc- 
curred within a hotspot or adjoining boxes. Subsequent to 
the meeting presentation, the M=8. 1 Macquarie Island earth- 
quake occurred on 23 December 2004 and the M=9.0 Suma- 
tra earthquake occurred on 26 December 2004. The epicen- 
ters of both earthquakes were successfully forecast. 

7 Forecast verification 

Previous tests of earthquake forecast s have emphasized 
the likelihood test (Kagan and Jackso nj. |200( : iRuridle et all 
l2002t iTiampo et all i2002bt .HoUidav et all \200^ . These 
tests have the significant disadvantage that they are overly 
sensitive to the least probable events. For example, con- 
sider two forecasts. The first perfectly forecasts 99 out of 
100 events but assigns zero probability to the last event. The 
second assigns zero probability to all 100 events. Under a 
likelihood test, both forecasts will have the same skill score 
of — cx). Furthermore, a naive forecast that assigns uniform 
probability to all possible sites will always score higher than 
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Fig. 3. World-wide application of the PI method. Colored areas are 
the forecast hotspots for the occurrence of M>7 earthquakes during 
the period 2000-2010 derived using the PI method. The color scale 
gives values of the logjQ(P/Pmaa;). Also shown are the locations 
of the sixty three earthquakes with M>7 that have occurred in the 
region since 1 January 2000. 

perior For this reason, UkeUhood tests are more subject to 
unconscious bias. 

An extensive review on forecast verification in the atmo- 
spheri c sciences has been given by Joliffee and Stephenson 
(120031) . The wide variety of approaches that they consider 
are directly applicable to earthquake forecasts as well. The 
earthquake forecasts considered in this paper can be viewed 
as binary forecasts by considering the events (earthquakes) 
as being forecast either to occur or not to occur in a given 
box. We consider that there are four possible outcomes for 
each box, thus two ways to classify each red, hotspot, box, 
and two ways to classify each white, non-hotspot, box: 

1 . An event occurs in a hotspot box or within the Moore 
neighborhood of the box (the Moore neighborhood is 
comprised of the eight boxes surrounding the forecast 
box). This is a success. 

2. No event occurs in a white non-hotspot box. This is also 
a success. 

3. No event occurs in a hotspot box or within the Moore 
neighborhood of the hotspot box. This is a false alarm. 

4. An event occurs in a white, non-hotspot box. This is a 
failure to forecast. 

We note that these rules tend to give credit, as successful 
forecasts, for events that occur very near hotspot boxes. We 
have adopted these rules in part because the grid of boxes is 
positioned arbitrarily on the seismically active region, thus 
we allow a margin of error of ±1 box dimension. In addi- 
tion, the events we are forecasting are large enough so that 
their source dimension approaches, and can even exceed, the 
box dimension meaning that an event might have its epicen- 
ter outside a hotspot box, but the rupture might then propa- 
gate into the box. Other similar rules are possible but we have 



The standard approach to the evaluation of a binary fore- 
cast is th e use of a r elative operating characteristic (ROC) di- 
agram (.Swets[ll973^ Mason. 2003). Standard ROC diagrams 
consider the fraction of failures-to-predict and the fraction of 
false alarms. This method evaluates the performance of the 
forecast method relative to random chance by constructing 
a plot of the fraction of failures to predict against the frac- 
tion of false alarms for an ensemble of forecasts. 'MolchanI 
i.l997J has used a modification of this method to evaluate the 
success of intermediate term earthquake forecasts. 

The binary approach has a long history, over 100 years, in 
the verification of tornado forecasts (Mason. 20_03 ). These 
forecasts take the form of a tornado forecast for a specific 
location and time interval, each forecast having a binary set 
of possible outcomes. For example, during a given time win- 
dow of several hours duration, a forecast is issued in which 
a list of counties is given with a statement that one or more 
tornadoes will or will not occur. A 2 x 2 contingency ta- 
ble is then constructed, the top row contains the counties in 
which tornadoes are forecast to occur and the bottom row 
contains counties in which tornadoes are forecast to not oc- 
cur. Similarly, the left column represents counties in which 
tornadoes were actually observed, and the right column rep- 
resents counties in which no tornadoes were observed. 

With respect to earthquakes, our forecasts take exactly this 
form. A time window is proposed during which the fore- 
cast of large earthquakes having a magnitude above some 
minimum threshold is considered valid. An example might 
be a forecast of earthquakes larger than M — 5 during a 
period of five or ten years duration. A map of the seis- 
mically active region is then completely covered ("tiled") 
with boxes of two types: boxes in which the epicenters of 
at least one large earthquake are forecast to occur and boxes 
in which large earthquakes are forecast to not occur In other 
types of forecasts, large earthquakes are given some contin- 
uous probabilit^_of_occun2nc^ from 0% to 100% in each 
box (iKagan and JacksoHbOOOl) . These forecasts can be con- 
verted to the binary type by the application of a threshold 
value. Boxes having a probability below the threshold are 
assigned a forecast rating of non-occurrence during the time 
window, while boxes having a probability above the thresh- 
old are assigned a forecast rating of occurrence. A high 
threshold value may lead to many failures to predict (events 
that occur where no event is forecast), but few false alarms 
(an event is forecast at a location but no event occurs). The 
level at which the threshold is set is then a matter of public 
policy specified by emergency planners, representing a bal- 
ance between the prevalence of failures to predict and false 
alarms. 

8 Binary earthquake forecast verification 

To illustrate this approach to earthquake forecast verification, 
we have constructed two types of retrospective binary fore- 
casts for California. The first type of forecast utilizes the PI 
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California and adjacent regions (32° to 38.3° N lat, 238° to 
245° E long) using a grid of boxes with Aa; — 0.1° and a 
lower magnitude cutoff Mc = 3.0. For this retrospective 
forecast we take the initial time to = 1932, the change in- 
terval ti = 1989 to t2 = 2000, and the forecast interval 
h, = 2 000 to t3 = 2010 ('Rund le et all 12002- 'Tiampo et al.L 
l2002bl) . In the analysis given above we considered regions 
with AP positive to be hotspots. The PI forecast under 
the above conditions with AP > is given in Figure]^. 
Hotspots include 127 of the 5040 boxes considered. This 
forecast corresponds to that given in Figure^ The threshold 
for hotspot activation can be varied by changing the thresh- 
old value for AP. A forecast using a higher threshold value 
is given in Figure Hotspots here include only 29 of the 
5040 boxes considered. 

An alternative approach to earthquake forecasting is to use 
the rate of occurrence of earthquakes in the past. We refer 
to this type of forecast as a relative intensity (RI) forecast. 
In such a forecast, the study region is tiled with boxes of 
size 0.1° X 0.1°. The numberof earthquakes with magnitude 
M > 3.0 in each box down to a depth of 20 km is determined 
over the time period from to ~ 1932 to t2 — 2000. The RI 
score for each box is then computed as the total number of 
earthquakes in the box in the time period divided by the value 
for the box having the largest value. A threshold value in 
the interval [0, 1] is then selected. Large earthquakes having 
M > 5 are then considered possible only in boxes having an 
RI value larger than the threshold. The remaining boxes with 
RI scores smaller than the threshold represent sites at which 
large earthquakes are forecast to not occur The physical jus- 
tification for this type of forecast is that large earthquakes 
are considered most likely to occur at sites of high seismic 
activity. 

In order to make a direct comparison of the RI forecast 
with the PI forecast, we select the threshold for the RI fore- 
cast to give the same box coverage given for the PI forecast in 
Figures|4j\ and|^, i.e. 29 boxes and 127 boxes respectively. 
Included in all figures are the earthquakes with AI > 5 that 
occurred between 2000 and 2005 in the region under consid- 
eration. 

9 Contingency tables and ROC diagrams 

The first step in our generation of ROC diagrams is the con- 
struction of the 2x2 contingency table for the PI and RI 
forecast maps given in Figure |3 The hotspot boxes in each 
map represent the forecast locations. A hotspot box upon 
which at least one large future earthquake during the forecast 
period occurs is counted as a successful forecast. A hotspot 
box upon which no large future earthquake occurs during the 
forecast period is counted as an unsuccessful forecast, or al- 
ternately, a false alarm. A white box upon which at least one 
large future earthquake during the forecast period occurs is 
counted as a failure to forecast. A white box upon which no 
large future earthquake occurs during the forecast period is 
counted as a unsuccessful forecast of non-occurrence. 



(A) 



Pattern Informatics (PI) Relative Intensity (RI) 




(B) 



Pattern Informatics (PI) Relative Intensity (RI) 




-123' -122' -121' -120' -119' -118' -117' -116' -115'-123' -122' -121' -120' -119' -118' -117' -116' -115' 

Fig. 4. Retrospective application of the PI and RI methods for 
southern California as a function of false alarm rate. Red boxes 
are the forecast hotspots for the occurrence of -M > 5 earthquakes 
during the period 2000 to 2005. Also shown are the locations of the 
Af > 5 earthquakes that occurred in this region during the forecast 
period. In Figure|4l\, a threshold value was chosen such that F ~ 
0.005. In Figure|4|i, a threshold value was chosen such that F ~ 
0.021. 

Verification of the PI and RI forecasts proceeds in exactly 
the same was as for tornado forecasts. For a given number of 
hotspot boxes, which is controlled by the value of the prob- 
ability threshold in each map, the contingency table (see Ta- 
ble|2j is constructed for both the PI and RI maps. Values for 
the table elements a (Forecast=yes, Observed=yes), b (Fore- 
cast=yes, Observed=no), c (Forecast=no, Observed=yes), 
and d (Forecast=no, Observed=no) are obtained for each 
map. The fraction of colored boxes, also called the proba- 
bility of forecast of occurrence, is r = (a + b)/N, where the 
total number of boxes is N = a + b + c + d. The hit rate is 
H = a/(a + c) and is the fraction of large earthquakes that 
occur on a hotspot. Tht false alarm rate is P = b/{b + d) 
and is the fraction of non-observed earthquakes that are in- 
correctly forecast. 

To analyze the information in the PI and RI maps, the stan- 
dard procedure is to consider all possible forecasts together 
These are obtained by increasing P from (corresponding 
to no hotspots on the map) to 1 (all active boxes on the map 
are identified as hotspots). The plot of H versus P is the 
relative operating characteristic (ROC) diagram. Varying the 
threshold value for both the PI and RI forecasts, we have ob- 
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Table 2. Contingency tables as a function of false alarm rate. In 
Table|2j\, a threshold value was chosen such that F ~ 0.005. In 
Table|2^, a threshold value was chosen such that F ^ 0.021. 
(A) 

Pattern informatics (PI) forecast 



Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 4 


(b) 25 


29 


No 


(c) 13 


(d) 4998 


5011 


Total 


17 


5023 


5040 


Relative intensity (RI) forecast 


Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 2 


(b) 27 


29 


No 


(c) 14 


(d) 4997 


5011 


Total 


16 


5024 


5040 



(B) 



Pattern informatics (PI) forecast 



Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 23 


(b) 104 


127 


No 


(c)9 


(d) 4904 


4913 


Total 


32 


5008 


5040 


Relative intensity (RI) forecast 


Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 20 


(b) 107 


127 


No 


(c) 10 


(d) 4903 


4913 


Total 


30 


5010 


5040 



tained the values of H and F given in Figure |5j blue for the 
PI forecasts and red for the RI forecasts. The results coiTe- 
sponding to the maps given in Figure |3 and the contingency 
tables given in Table El are given by the filled symbols. The 
forecast with 29 hotspot boxes (Figure |5j\ and Table |2j\) 
has Fpi = 0.00498, Hpi = 0.235 and Fri = 0.00537, 
Hbj = 0.125. The forecast with 127 hotspot boxes (Fig- 
ure |5^ and Table |2^) has Fpi = 0.0207, Hpi = 0.719 
and Fpj = 0.0213, Hpj = 0.666. Also shown in Figure^ 
is a gain curve (green) defined by the ratio of Hpi{F) to 
Hri{F). Gain values greater than unity indicate better per- 
formance using the PI map than using the RI map. The hori- 
zontal dashed line coiTesponds to zero gain. From Figure|5lit 
can be seen that the PI approach outperforms (is above) the 
RI under many circumstances and both outperform a random 
map, where H = F,hy ?l large margin. 

10 Discussion 

The fundamental question is whether forecasts of the time 
and location of future earthquakes can be accurately made. It 

ic Qff*f»nf"f»H tliQt Inner f"f»rm liiiTQrH miinc t\f tlif» f»Ynf»f*t"f»H ruff* 




Fig. 5. Relative operating characteristic (ROC) diagram. Plot of 
hit rates, H, versus false alarm rates, F, for the PI forecast (blue) 
and RI forecast (red). Also shown is the gain ratio (green) de- 
fined as Hpi{F) / Hri{F). The filled symbols correspond to the 
threshold values used in Figure |4] and Table |2| solid circles for 29 
hotspot boxes and solid squares for 127 hotspot boxes. The hori- 
zontal dashed line corresponds to zero gain. 



of occurrence of earthquakes are reasonably accurate. But is 
it possible to do better? Are there precursory phenomena that 
will allow earthquakes to be forecast? 

It is actually quite surprising that immediate local precur- 
sory phenomena are not seen. Prior to a volcanic eruption, 
increases in regional seismicity and surface movements are 
generally observed. For a fault system, the stress gradually 
increases until it reaches the frictional strength of the fault 
and a rupture is initiated. It is certainly reasonable to hy- 
pothesize that the stress increase would cause increases in 
background seismicity and aseismic slip. In order to test 
this hypothesis the Parkfield Earthquake Prediction Experi- 
ment was initiated in 1985. The expected Parkfield earth- 
quake occurred beneath the heavily instrumented region on 
28 Sep tember 2004. No local precursory changes were ob- 
served ( lLindhll2005l) . 

In the absence of local precursory signals, the next ques- 
tion is whether broader anomalies develop, and in particular 
whether there is anomalous seismic activity. It is this ques- 
tion that is addressed in this paper. Using a technique that 
has been successfully applied to the forecasting of El Nino 
we have developed a systematic pattern informatics (PI) ap- 
proach to the identification of regions of anomalous seismic 
activity. Applications of this technique to California, Japan, 
and on a world-wide basis have successfully forecast the lo- 
cation of future earthquakes. It must be emphasized that this 
is not an earthquake prediction. It is a forecast of where 
future earthquakes are expected to occur during a relatively 
long time window of ten years. The objective is to reduce the 
possible future sites of earthquakes relative to a long term 
hazard assessment map. 

Examination of the ROC diagrams indicates that the most 
important and useful of the suite of forecast maps are those 

\x7it"li thp lf»Qct" niimhf»r r»f lir»tcnr»t hnYf»c i ^ tlincf* \x7itli cmQll 
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values of the false alarm rate, F. A relatively high propor- 
tion of these hotspot boxes represent locations of future large 
earthquakes, however these maps also have a larger number 
of failures-to-forecast. Exactly which forecast map(s) to be 
used will be a decision for pohcy-makers, who will be called 
upon to balance the need for few false alarms against the de- 
sire for the least number of failures-to-forecast. 

Finally, we remark that the methods used to produce the 
forecast maps described here can be extended and improved. 
We have found modifications to the procedures described in 
Section 5 that allow the PI map to substantially outperform 
the RI map as indicated by the respective ROC diagrams. 
These methods are based on the approach of: 1) starting with 
the RI map, and introducing improvements using the steps 
described for the PI method; and 2) introducing an additional 
averaging step. This new method and its results will be de- 
scribed in a future publication. 
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